[57] ABSTRACT

In a computer program, a branch instruction selects a
prediction heuristic from a plurality of prediction heuristics
for predicting whether the branch will be taken during
execution of the program by a computer. A current pattern
comprises a number of consecutive identical branch decisions
for the instruction. A prior pattern comprises a number
of consecutive identical prior branch decisions for the
instruction, the prior branch decisions occurring prior to the
branch decisions comprised by the current pattern. The
selected prediction heuristic generates a branch prediction
using the current pattern and the prior pattern. The selected
prediction heuristic is identified by adding profiling instructions
to the program to compute history information for the
branch instruction. The profiling instructions input the
branch history information to a plurality of prediction
heuristics, and each prediction heuristic outputs a prediction
of whether the branch instruction will be taken. The program
is executed with a sample data set, and the output of each
prediction heuristic is compared to the branch decision for
the instruction to identify which heuristic most accurately
predicts the branch decision for the branch instruction.

54 Claims, 7 Drawing Sheets

BRANCH PREDICTOR USING MULTIPLE
PREDICTION HEURISTICS AND A
HEURISTIC IDENTIFIER IN THE BRANCH
INSTRUCTION

BACKGROUND 

1. Field of the Invention

The invention relates to methods and circuits for branch
prediction in microprocessors. More specifically, the invention
relates to predicting whether a branch will be Taken
using multiple local and global prediction heuristics.

2. Art Background 

Pipelining is a proven method for enhancing the performance
of the central processing unit (CPU) in a digital
computer. In a pipelined CPU, multiple functional units
simultaneously execute multiple instructions from a computer
program, leading to substantial performance increases.

A pipelined CPU operates most efficiently when the
instructions are executed in the sequence in which they
appear in memory. Unfortunately, branch instructions constitute
a large portion of the executed instructions in a
computer program. When a branch instruction is executed,
execution either continues with the next sequential
instruction or jumps to an instruction at a specified "target"
address. The branch specified by the instruction is said to be
"Taken" if execution jumps, or "Not Taken" if execution
continues with the next sequential instruction in memory.

A branch instruction is either unconditional, meaning the
branch is taken every time the instruction is executed, or
conditional, meaning the branch is taken or not depending
upon a condition. The instructions to execute following a
conditional branch are not known with certainty until the
condition upon which the branch depends is resolved.
Prefetching and executing the instructions at the target
address of the branch can lead to a significant performance
hit when the branch is Not Taken. Branches may also be
"forward", meaning the target address is greater than the
instruction pointer (IP), or "backward", meaning the target
address is less than the instruction pointer.

To compensate for the execution uncertainty caused by
conditional branches, advanced pipelined CPUs employ
"branch prediction". Branch prediction predicts the outcome
of each conditional branch instruction in the program before
the instruction is executed. If the branch is predicted as
Taken, the processor fetches and executes instructions
beginning at the target address of the branch. If the branch
is predicted Not Taken, execution continues at the next
instruction after the branch instruction.

When a branch prediction is incorrect, any fetched and
partially executed instructions resulting from the incorrect
prediction must be flushed from the pipeline. Even a prediction
miss rate of 5 percent results in a substantial loss in
performance due to the number of instructions incorrectly
fetched/partially executed in reliance on the wrong predictions.
Further delays are incurred while the processor fetches
the correct instructions to execute following the branch.

As the instruction issue rate and pipeline depth of processors
increase, the accuracy of branch prediction
becomes an increasingly significant factor in performance.
Many schemes have been developed for improving the
accuracy of branch predictions. These schemes may be
classified broadly as either static or dynamic. Static schemes
use branch opcode information and profiling statistics from
executions of the program to make predictions. Static prediction
schemes may be as simple as predicting that all
branches are Not Taken or predicting that all branches are
Taken. Predicting that all branches are Taken can achieve
approximately 68 percent prediction accuracy as reported by
Lee and Smith (J. Lee and A. J. Smith, "Branch Prediction
Strategies and Branch Target Buffer Design", IEEE
Computer, (January 1984), pp. 6-22). Another static scheme
predicts that certain types of branches (for example,
jump-on-zero instructions) will always be Taken or Not Taken.
Static schemes may also be based upon the direction of the
branch, as in "if the branch is backward, predict Taken, if
forward, predict Not Taken". This latter scheme is effective
for loop intensive code, but does not work well for programs
where the branch behavior is irregular.
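
By way of illustration only, the direction-based static rule quoted above may be sketched in Python as follows. The function name and the use of plain integers for addresses are assumptions made for this sketch and do not appear in the cited schemes.

```python
def predict_by_direction(instruction_pointer: int, target_address: int) -> bool:
    """Return True (predict Taken) for a backward branch, False otherwise.

    Hypothetical helper: a branch is "backward" when its target address
    is less than the instruction pointer, as defined earlier in the text.
    """
    return target_address < instruction_pointer


# A loop-closing branch at 0x120 that jumps back to 0x100.
print(predict_by_direction(0x120, 0x100))  # True
# A forward skip from 0x120 to 0x140.
print(predict_by_direction(0x120, 0x140))  # False
```

The rule needs no run-time history, which is why it suits loop-closing branches and fails on branches whose direction does not correlate with their behavior.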
One method of static prediction involves storing a
"branch bias" bit with each branch instruction. When the
instruction is decoded, the "branch bias" bit is used to
predict whether the branch is Taken or not. The bias bit is
usually determined statistically by profiling the program
with sample data sets, prior to execution. A profiling method
is used to generate the branch bias bit. First the program is
loaded into the computer memory. Starting with the first
instruction in the program, a branch instruction is located.
Instructions are added to the program to record branch
decisions for the instruction. The program is then executed
with a number of sample data sets. Execution is stopped, and
beginning with the first instruction in the program each
branch instruction is located. The profiling instructions are
removed from the program, and if the probability that the
branch will be Taken exceeds 50%, then the branch bias bit
is set in the branch instruction and saved with the program.
When the program is next executed, the bias bit is examined.
If set, the branch is always predicted as Taken during
execution of the program. Otherwise, the branch is always
predicted as Not Taken.
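
The bias-bit decision described above may be sketched as follows, purely as an illustration. The function name, the representation of the recorded decisions as a list of booleans, and the choice to leave the bit clear when no decisions were recorded are all assumptions of this sketch.

```python
from typing import Sequence


def branch_bias_bit(decisions: Sequence[bool]) -> bool:
    """Return True (bit set, predict Taken) when more than 50% of the
    recorded decisions were Taken.

    `decisions` holds one entry per profiled execution of the branch,
    True meaning Taken. An empty profile leaves the bit clear
    (assumption: the text does not cover this case).
    """
    if not decisions:
        return False
    # Integer comparison avoids floating-point rounding at exactly 50%.
    return 2 * sum(decisions) > len(decisions)


print(branch_bias_bit([True, True, False, True]))  # True  (3 of 4 Taken)
print(branch_bias_bit([True, False]))              # False (50% does not exceed 50%)
```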

A disadvantage of all static prediction schemes is that they
ignore branch behavior in the currently executing program.
By contrast, dynamic prediction schemes examine the current
execution history of one or more branch instructions
when making predictions. Dynamic prediction can be as
simple as recording the last execution of a branch instruction
and predicting the branch will behave the same way the next
time. More sophisticated dynamic predictors examine the
execution history of a plurality of branch instructions.
Dynamic prediction typically requires more hardware than
static prediction because of the additional run-time computation
required.

In dynamic prediction, branch history information is
applied to a heuristic algorithm. The heuristic algorithm
inputs the branch execution history and outputs an indication
of whether the branch will be Taken or Not Taken the next
time it is executed. An example of a heuristic algorithm is
one which counts the number of Taken and Not Taken
decisions in the last M branch decisions. If the number of
Taken decisions equals or exceeds the number of Not-Taken
decisions, the branch is predicted as Taken.
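
A minimal sketch of the counting heuristic just described is given below for illustration. The class and method names are hypothetical, and the history is held in a bounded queue rather than in hardware registers.

```python
from collections import deque


class LastMPredictor:
    """Predict Taken when Taken decisions equal or exceed Not-Taken
    decisions among the last M recorded decisions (hypothetical sketch)."""

    def __init__(self, m: int) -> None:
        # deque(maxlen=m) silently drops the oldest decision once full.
        self.history = deque(maxlen=m)

    def predict(self) -> bool:
        taken = sum(self.history)
        not_taken = len(self.history) - taken
        # With no history both counts are zero, so the rule yields Taken.
        return taken >= not_taken

    def update(self, taken: bool) -> None:
        self.history.append(taken)


predictor = LastMPredictor(m=4)
for decision in [True, False, False, False]:
    predictor.update(decision)
print(predictor.predict())  # False (1 Taken against 3 Not-Taken)
```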

Dynamic prediction schemes may be further classified
into local and global prediction schemes. Local prediction
schemes depend entirely on the "self-history" of the branch
to predict. Specifically, local prediction schemes depend
exclusively on the execution history of the branch under
consideration, completely ignoring the execution history of
other branches in the program. Local prediction schemes
work well for scientific/engineering applications where program
execution is dominated by inner-loops. However, in
programs where control-flows are complex, the outcome of
a branch is often affected by the outcomes of other, recently
executed branches. Because of this correlation, the local
history of a branch, considered alone, is not an accurate
indicator of future branch behavior. Studies have shown that
branch correlation is traceable to high-level language constructs
(S. T. Pan, K. So, J. T. Rahmeh, "Improving the
Accuracy of Dynamic Branch Prediction Using Branch
Correlation", Proceedings of the 5th International Conference
on Architectural Support for Programming Languages
and Operating Systems, Oct. 1992).

One method of local branch prediction uses a history table
to record history information for a branch instruction. N bits
of the instruction address are used to index an entry in the
history table, where N is typically less than the number of
bits in the branch instruction address. Because N is less than
the number of bits in the branch instruction address, the
history table serves as a hash table for all possible branch
instructions in a program. Each entry of the history table
stores the address of the branch for which the information in
the entry is current. Storing the branch address in the entry
makes it possible to detect hash-collisions when the address
of a branch instruction does not match the address of the
instruction for which the history information in an entry is
current.

Each entry of the history table also contains an L-bit
branch sequence for a branch instruction, where L is a
number of prior branch decisions to record for the branch.
The L-bit branch sequence records whether the last L
executions of the branch instruction resulted in the branch
being Taken or Not-Taken. For example, if L=2 and the last
two executions of the branch resulted in a Taken and a
Not-Taken decision, then the branch sequence is 10, where
logical one (1) represents the Taken decision and logical
zero (0) represents the Not-Taken decision. Each entry in the
table also contains an array of 2^L saturating up-down
counters. For L=2, each entry also contains four saturating
up-down counters, one counter for each of the four possible
branch sequences. The possible sequences are: <Not-Taken,
Not-Taken>, <Not-Taken, Taken>, <Taken, Not-Taken>, and
<Taken, Taken>. In binary, these sequences are 00, 01, 10,
and 11. Each counter counts the number of times a particular
branch sequence results in a Taken decision when the branch
is next executed. For example, counter 0 records the number
of times the sequence 00 results in a branch decision of
Taken when the branch instruction is next executed.

To predict whether a branch will be Taken or Not Taken
upon the next execution of the branch instruction, the count
associated with the branch sequence for the instruction is
examined by the prediction heuristic logic. A typical heuristic
works as follows: if the count is greater than or equal
to a predetermined threshold value, the branch is predicted
Taken, otherwise the branch is predicted Not Taken. If the
count has P bits of resolution, a typical threshold value is
2^(P-1), which is the midpoint of the range of a P-bit counter.
Once the branch is executed, resulting in a branch decision,
the branch decision is input to the history update logic. If the
branch is Taken, the count for the branch sequence is
incremented by one. Otherwise the count is decremented by
one. If the count reaches 2^P-1 (i.e. the counter is saturated),
the count remains at that value as long as the branch is Taken
on subsequent executions for the same history sequence. If
the count reaches 0, it remains at zero as long as the branch
is Not Taken on subsequent executions for the same history
sequence. Once the count is updated, the branch sequence is
updated with the result of the branch decision. The high-order
bit is shifted out of the branch sequence, and the result
of the branch decision is shifted in. If the branch is Taken,
a 1 is shifted in, otherwise a 0 is shifted in.

Because each branch instruction has its own array of
counters, no correlation occurs between the information
used to predict separate branches. The scheme is relatively
expensive in terms of hardware required because every
branch instruction with an entry in the table requires a
history sequence and a table of counters. This scheme is also
less accurate for branches with execution behaviors highly
correlated to the behavior of other branches.

A similar counter-based prediction scheme uses global
branch history instead of a local branch history. The history
of all recently executed branches in the program is recorded
in a single N-bit shift register. As each branch in the program
is Taken or not, the high-order bit is shifted out of the
register, and the branch decision is shifted in. The contents
of the register represent the global branch history of the
program after several branch instructions are executed in the
program. The contents of the register are used to address a
particular entry in the global branch history table. Each entry
of the table contains a saturating up-down counter. The
selected counter counts recurrences of the global history
sequence currently recorded in the register. A prediction is
made by a prediction heuristic which inputs the count, in a
manner identical to that described for the local predictor.
The count for the global sequence in the register is also
updated in a manner similar to the manner described for the
local predictor.

Global prediction schemes are more accurate for branches
with execution behaviors correlated to the behavior of other
branches. Furthermore, global prediction schemes typically
require less hardware than purely local schemes because
only one shift register is required, and only one count per
entry in the global history table is required.

In either the local or global branch predictors just
described, one simple heuristic for predicting the next
branch is "if the current input history pattern results in a
Taken decision for the next branch instruction more than half
the time, predict Taken". For example, if a particular execution
pattern repeated six times, and four or more of those
times the next branch decision was Taken, then the heuristic
would predict Taken. Otherwise the heuristic would predict
Not Taken.

A purely local predictor requires recording the history
of each branch instruction separately (subject to
hashing). A purely local predictor also requires an array of
counters for each branch instruction. A purely global predictor
uses a single shift register to record the branch history
of all branch instructions in the program. Hybrids of
local and global predictors may be employed, reflecting
both local and global branch characteristics. One hybrid
predictor uses multiple shift registers. The history of multiple
branch instructions from a set of branch instructions in
the program is recorded in each register. This hybrid scheme
reflects correlations between branches in a set, without
entirely obscuring the local behavior of branches in the set.
Prediction is done independently for each set of instructions,
using the N-bit counter scheme discussed earlier. A separate
table of counters is maintained for each set of instructions.
A branch instruction is assigned to a set according to L bits
of the branch instruction address. For example, if the branch
instructions in a program are divided into eight sets, then
three bits of the branch instruction address are used to select
a set, and the histories of all branch instructions with the
same L bits in their address are superimposed into one
history register. The same L bits of the instruction address
are used to select the count in a branch history table corresponding
to the set to which the branch belongs. The count
is input to a prediction heuristic, and a prediction is made.
The branch instruction is then executed, resulting in a branch
decision. The L bits of the instruction address are used to
select which branch history register is updated by the branch
decision.
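
For illustration only, the counter-based local predictor described earlier in this section (hashed history table, L-bit branch sequence, 2^L saturating counters, threshold of 2^(P-1)) may be sketched as below. The class name, the default sizes, the initial counter value of zero, and the policy of re-initializing an entry on a hash collision are assumptions of this sketch; the text above states only that collisions can be detected.

```python
class LocalPredictor:
    """Hypothetical software model of a hashed local history table."""

    def __init__(self, index_bits: int = 4, seq_bits: int = 2,
                 counter_bits: int = 2) -> None:
        self.index_mask = (1 << index_bits) - 1    # N low-order address bits
        self.seq_bits = seq_bits                   # L
        self.seq_mask = (1 << seq_bits) - 1
        self.counter_max = (1 << counter_bits) - 1  # 2^P - 1 (saturation)
        self.threshold = 1 << (counter_bits - 1)    # 2^(P-1)
        self.table = {}

    def _entry(self, address: int) -> dict:
        index = address & self.index_mask
        entry = self.table.get(index)
        if entry is None or entry["tag"] != address:
            # First use or hash collision: start over (assumed policy).
            entry = {"tag": address, "seq": 0,
                     "counts": [0] * (1 << self.seq_bits)}
            self.table[index] = entry
        return entry

    def predict(self, address: int) -> bool:
        entry = self._entry(address)
        return entry["counts"][entry["seq"]] >= self.threshold

    def update(self, address: int, taken: bool) -> None:
        entry = self._entry(address)
        counts, seq = entry["counts"], entry["seq"]
        if taken:
            counts[seq] = min(counts[seq] + 1, self.counter_max)
        else:
            counts[seq] = max(counts[seq] - 1, 0)
        # Shift out the high-order bit and shift the new decision in.
        entry["seq"] = ((seq << 1) | int(taken)) & self.seq_mask


predictor = LocalPredictor()
for i in range(20):                      # Taken, Not-Taken, Taken, ...
    predictor.update(0x40, i % 2 == 0)
print(predictor.predict(0x40))           # True: sequence 10 was followed by Taken
```

A global variant would differ mainly in keeping one shift register for all branches and using its contents, rather than the instruction address, to select a single counter.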
Dynamic prediction schemes may involve single or multiple
heuristics. The prediction schemes described above are
single-heuristic schemes. Conventional multi-heuristic
schemes input the history of a branch instruction to multiple
heuristics, then select the "best" prediction for the branch
from among the heuristic outputs. One method of selecting
the "best" prediction is to elect the majority decision of the
multiple heuristics. For example, the branch history information
may be input to three heuristics. If a majority of the
three heuristics elect the branch Taken, then it is predicted
Taken by a majority circuit. Otherwise the branch is predicted
Not Taken. Multi-heuristic approaches are generally more
accurate than single-heuristic approaches, but typically
require more hardware to implement and are thus more
expensive.
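
The majority election described above may be sketched as follows. The three heuristics shown are hypothetical placeholders chosen only to make the example runnable; the text does not specify which heuristics a conventional multi-heuristic scheme uses.

```python
from typing import Callable, Sequence

Heuristic = Callable[[Sequence[bool]], bool]


def last_outcome(history: Sequence[bool]) -> bool:
    """Placeholder heuristic: repeat the most recent decision."""
    return history[-1] if history else True


def mostly_taken(history: Sequence[bool]) -> bool:
    """Placeholder heuristic: Taken if at least half of the history is Taken."""
    return 2 * sum(history) >= len(history)


def always_taken(history: Sequence[bool]) -> bool:
    """Placeholder heuristic: static Taken prediction."""
    return True


def majority_vote(history: Sequence[bool],
                  heuristics: Sequence[Heuristic]) -> bool:
    """Predict Taken only if more than half of the heuristics elect Taken."""
    votes = sum(1 for heuristic in heuristics if heuristic(history))
    return 2 * votes > len(heuristics)


heuristics = [last_outcome, mostly_taken, always_taken]
print(majority_vote([False, False, True], heuristics))  # True  (2 of 3 elect Taken)
print(majority_vote([True, False, False], heuristics))  # False (1 of 3 elects Taken)
```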

SUMMARY OF THE INVENTION 

The present invention is a method and apparatus for
improving the accuracy of branch predictions during the
execution of a computer program. Prior art methods work
well for predicting most branch instructions in a computer
program, but a few branches in the program defy accurate
prediction using the prior art methods. The present invention
improves the prediction accuracy of all branches, by using
the branch instruction to select a prediction heuristic which
works best for the instruction. When the program is
executed, a current pattern is generated comprising a number
of consecutive identical branch decisions for the branch
instruction. A prior pattern is generated comprising a number
of consecutive identical prior branch decisions for the
instruction, where the prior decisions occur prior to the
branch decisions comprised by the current pattern. The
current pattern and the prior pattern are input to the prediction
heuristic selected by the branch instruction, and the
output of this heuristic becomes the prediction of whether
the branch will be Taken.

The prediction heuristic selected by a branch instruction
is determined by adding profiling instructions to the program
to compute history information for the branch instruction.
The profiling instructions input the branch history information
to a plurality of prediction heuristics. Each prediction
heuristic outputs a prediction of whether the branch instruction
will be Taken. The program is executed with a sample
data set, and the output of each prediction heuristic is
compared to the branch decision for the instruction to
identify which heuristic most accurately predicts the branch
decision for the branch instruction.
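
As a rough sketch of the selection procedure summarized above, the following Python fragment scores several heuristics against the recorded decisions of one branch and returns the index of the most accurate one. It is illustrative only: the heuristics are hypothetical placeholders that take the raw decision history rather than patterns, and the recorded trace stands in for executing the program with a sample data set.

```python
from typing import Callable, List, Sequence, Tuple

Heuristic = Callable[[Sequence[bool]], bool]


def always_taken(history: Sequence[bool]) -> bool:
    return True


def same_as_last(history: Sequence[bool]) -> bool:
    return history[-1] if history else True


def opposite_of_last(history: Sequence[bool]) -> bool:
    return (not history[-1]) if history else True


def select_heuristic(trace: Sequence[bool],
                     heuristics: Sequence[Heuristic]) -> Tuple[int, List[int]]:
    """Return (index of the most accurate heuristic, per-heuristic hit counts).

    Each heuristic predicts from the history seen so far; its output is
    then compared to the actual branch decision. Ties go to the lowest index.
    """
    correct = [0] * len(heuristics)
    history: List[bool] = []
    for decision in trace:
        for i, heuristic in enumerate(heuristics):
            if heuristic(history) == decision:
                correct[i] += 1
        history.append(decision)
    best = max(range(len(heuristics)), key=lambda i: correct[i])
    return best, correct


trace = [True, False] * 3   # an alternating branch
best, correct = select_heuristic(
    trace, [always_taken, same_as_last, opposite_of_last])
print(best, correct)        # 2 [3, 1, 6]
```

The returned index plays the role of the value that the branch instruction would carry to select its heuristic.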

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustration of one embodiment
of the system of the invention, that provides a prediction
scheme containing multiple heuristics and a branch instruction
identifying the heuristic to use.

FIG. 2 is a flow chart illustrating one embodiment of a
method for predicting whether a branch will be Taken.

FIG. 3 is a block diagram illustration of one embodiment 
of a method of identifying the heuristic to use for a branch 
instruction. 

FIG. 4 is a block diagram illustration of one embodiment 
of a system which combines local and global heuristics. 

FIG. 5 illustrates exemplary circuitry for predicting
whether a branch will be taken using multiple local heuristics.

FIG. 6 illustrates one example of circuitry for updating
branch history information.

FIG. 7 is a flow diagram illustrating one embodiment of
a method of updating the prior patterns and counts for a
branch instruction.

DETAILED DESCRIPTION

The present invention improves branch prediction by
identifying the best prediction heuristic from a plurality of
heuristics to use for each branch instruction in a computer
program. In the preferred embodiment, the best heuristic for
a branch instruction is identified before normal execution of
the program, although alternate embodiments are possible in
which the best heuristic is identified during the initial phases
of normal execution. Branch history information is input to
the identified heuristic to predict whether the branch will be
taken. In one embodiment, the branch history comprises a
current pattern, a prior pattern, and a pattern count. A field
in the branch instruction identifies the prediction heuristic to
use for the instruction, and the output of the identified
heuristic becomes the prediction of whether the branch will
be Taken. The present invention improves the accuracy of
branch prediction for all branches in a computer program,
and in particular improves the accuracy of a few branches
which defy conventional prediction schemes.

In the present invention, branch history information is
kept in the form of patterns. A pattern identifies one or more
consecutive identical branch decisions for the instruction.
For example, assume that a branch instruction is executed
four times. On each of the first three executions the branch
is Not-Taken, and then it is Taken on the fourth execution.
The pattern 0001 is formed, where a binary zero (0) represents
a Not-Taken decision for the branch, and a binary one
(1) represents a Taken decision. Alternatively, the same
sequence of branch decisions could produce the pattern
1110, where a binary one (1) represents a Not-Taken
decision, and a binary zero (0) represents a Taken decision.
In both cases, the pattern can be represented by the number
three (3), which is the number of consecutive identical
branch decisions which created the pattern. Alternatively,
the pattern can be represented by the number four (4), which
is the total number of bits in the pattern. The invention is not
limited to a particular encoding method for the pattern.
Other encoding methods are possible, with the above
examples intended to merely illustrate the possible encodings.
Examples in this disclosure identify a taken decision
with a binary one (1) and a not-taken decision with a binary
zero (0).
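
The pattern formation and the two numeric encodings described above may be sketched as follows, by way of illustration only. The function names are hypothetical, and the sketch assumes that a pattern is closed by the first decision that differs from its run and that the next pattern starts with the decision after that.

```python
from typing import List, Tuple


def split_into_patterns(history: str) -> Tuple[List[str], str]:
    """Split a string of '0'/'1' decisions (oldest first) into completed
    patterns and the still-open current run."""
    completed: List[str] = []
    run = ""
    for bit in history:
        if run and bit != run[0]:
            completed.append(run + bit)  # run closed by the differing decision
            run = ""
        else:
            run += bit
    return completed, run


def encode_run_length(pattern: str) -> int:
    """Number of consecutive identical decisions that begin the pattern."""
    count = 0
    for bit in pattern:
        if bit != pattern[0]:
            break
        count += 1
    return count


def encode_total_bits(pattern: str) -> int:
    """Total number of bits in the pattern, including the closing decision."""
    return len(pattern)


completed, current = split_into_patterns("0001")
print(completed, repr(current))        # ['0001'] ''
print(encode_run_length("0001"))       # 3
print(encode_total_bits("0001"))       # 4
print(encode_run_length("1110"))       # 3
```

Both encodings give the same numbers for 0001 and 1110, which reflects the statement above that the representation does not depend on which binary value denotes Taken.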

FIG. 1 shows one embodiment of the present invention. In
this embodiment, several bits of the branch instruction
address 101 are used to index into a branch history table 102.
Each entry 103 in the branch history table contains the
current pattern for a particular branch instruction, an array of
one or more prior patterns, and an array of pattern counts,
each pattern count corresponding to a prior pattern. The
current pattern identifies a number of most recent consecutive
identical branch decisions for the branch instruction.
Prior patterns identify sequences of identical consecutive
branch decisions which occurred prior to the current pattern.
A count is kept of the number of occurrences of each prior
pattern.

For example, consider a branch with the history sequence
01000101000. According to this sequence, after eleven
executions the branch was Taken three times and Not-Taken
eight times. The current pattern is 000, which is encoded as
the number three (3), representing the last three consecutive
Not-Taken decisions for the branch. The prior patterns are,
from left to right in the sequence, 01, 0001, and 01. The prior
pattern 01 occurs twice and the prior pattern 0001 occurs
once in the sequence. The prior patterns can be encoded as
<2,2>, <4,1>, <2,2>. The sequence 01 is encoded as the
number two (2), and it occurs two times, resulting in two
pattern/count pairs of <2,2>. The sequence 0001 is encoded
as the number four (4), and it occurs once, resulting in a
single pattern/count pair of <4,1>. Scanning the branch
history sequence from right to left, it is readily apparent that
in this embodiment the prior patterns are ordered according
to their position in the branch history sequence.

Each entry 103 in the local history table also includes the
branch instruction address (a copy of branch instruction
address 101) for which pattern information is currently
maintained. Because only a subset of the bits from the
branch instruction address 101 are used to address an entry
103 in the table, each entry 103 in the table may correspond
to more than one branch instruction address 101 in the
program. It is thus necessary to maintain a copy of the branch
instruction address 101 for which the pattern information is
current, so that the entry 103 can be re-initialized when a
second branch instruction address 101 maps to the entry 103
previously used by a first branch instruction address 101.
When the branch instruction address 101 stored in the local
history table 102 does not match the branch instruction
address 101 used to index the entry 103, then a hash collision
occurs. The entry 103 is re-initialized and a new copy of the
branch instruction address 101 is recorded in the entry 103.

The current pattern, prior patterns, and counts are input to
a plurality of prediction heuristics 104. In one embodiment,
the outputs of all prediction heuristics are input to a
multiplexer 105. The field BSEL within the branch instruction
100 selects an output from the plurality of prediction
heuristic outputs which becomes the prediction 106.
Alternately, the current pattern, prior patterns, and counts
may be input to only the prediction heuristic identified by the
BSEL field in the branch instruction 101, using the multiplexer
105 at the input of the plurality of prediction heuristics
104. The branch instruction is subsequently executed,
resulting in a branch decision 108. The branch decision is
input to history update logic 107, which updates the current
pattern, prior patterns, and counts according to the branch
decision. In one embodiment, the current pattern is updated
first. In the present example, the current pattern for the
branch is updated from 000 to 0001. This pattern is checked
against the prior patterns. The pattern 0001 appeared once
before, and so the count for this pattern is updated from
one (1) to two (2). The new pattern/count pair becomes
<4,2>. The current pattern is reset to zero (0), because the
sequence of three Not-Taken decisions is interrupted by the
most recent Taken decision.

A few of the many types of heuristics that may be utilized
are described below. It is contemplated that many types of
heuristics other than the ones described here may be utilized.
One possible heuristic is a Most-Likely-Pattern-Next
(MLPN) heuristic. This heuristic predicts the next branch
decision based upon the most frequently occurring prior
pattern for the instruction. Therefore, if the current pattern is
0, and the prior patterns are <2,2> and <4,1>, the MLPN
predictor predicts a next pattern of 01, because this prior
pattern occurs most frequently for the instruction (twice).
Because the pattern 01 is generated by predicting the next
branch Taken, the MLPN predictor predicts the next branch
Taken in the present example. When implemented in a
pipelined computer processor, the Taken prediction results
in the prefetch and processing of the program instructions at
the target address of the branch instruction.

Another heuristic is the Least-Likely-Pattern-Next
(LLPN) heuristic. This heuristic predicts the next branch
decision based on the least frequently occurring prior branch
pattern. Therefore, if the current pattern is 000, and the prior
patterns are <2,2> and <4,1>, the LLPN predictor predicts a
next pattern of 0001, because the prior pattern 0001 occurs
least frequently for this instruction. Because the pattern
0001 is generated by predicting the branch Taken, the
LLPN predictor predicts the branch Taken in the present
example. When implemented in a pipelined computer
processor, the Taken prediction results in the prefetch and
processing of the program instructions at the target address
of the branch instruction.

Another heuristic is the most recent pattern next (MRPN)
heuristic. This heuristic predicts the next branch decision
based on the most recently occurring prior branch pattern.
Therefore, if the current pattern is 0, and the prior patterns are
<2,2> and <4,1>, the MRPN predictor predicts a next pattern
of 01, because the prior pattern 01 occurs most recently for
this instruction (only the current pattern is more recent). This
heuristic requires that the array of prior patterns and counts
are ordered according to their position in the branch history
sequence. Because the pattern 01 is generated by predicting
the branch Taken, the MRPN predictor predicts the branch
Taken in the present example. When implemented in a
pipelined computer processor, the Taken prediction results
in the prefetch and processing of the program instructions at
the target address of the branch instruction.

Another heuristic is the delta plus last pattern next
(DPLP) heuristic. This heuristic predicts the next branch
decision based upon the difference between the two most
recently occurring prior branch patterns. The difference is
added to the second-most recently occurring prior pattern to
compute the next predicted pattern. Therefore, if the current
pattern is 0, and the prior patterns are <2,2>, <4,1>, and
<2,2>, the DPLP predictor predicts a next pattern of 000001,
i.e. six (6). This next pattern is predicted as follows: (1)
compute the difference between the second prior pattern and
the first, which yields 4-2=2; and (2) add this difference to
the second prior pattern, which yields 4+2=6. The pattern six
(6) will not occur if the branch is predicted Taken, and
therefore the branch is predicted Not-Taken in the present
example. The DPLP heuristic requires that the array of prior
patterns and counts are ordered according to their position in
the branch history sequence. When implemented in a pipelined
computer processor, the Taken prediction results in the
prefetch and processing of the program instructions at the
target address of the branch instruction.

FIG. 2 describes the steps for predicting whether a branch
will be Taken using the embodiment illustrated by FIG. 1.
First, the current pattern, prior patterns, and pattern counts
for the branch instruction are retrieved from memory 200.
This information is input to multiple prediction heuristics
201. Each prediction heuristic predicts whether the branch
will be Taken or Not Taken 202. The output of the prediction
heuristic identified by the BSEL field of the branch instruction
is selected 203. The branch instruction is subsequently
executed to determine a branch decision 204. The current
pattern is updated according to the branch decision 205, and
the updated current pattern is used to update the prior
patterns and counts 206.

In one embodiment, the prediction heuristic identified in
a branch instruction's field is determined using the method
shown in FIG. 3. First, the computer program containing the
branch instructions is loaded from permanent memory into
RAM 300. Profiling instructions are added to the program at
or near the location of each branch instruction to compute
the current pattern, the prior patterns and the pattern counts
for the branch instruction 301. The program is subsequently
executed with a sample data set 302. Thus, when a branch
instruction is encountered during program execution, the
profile instructions are executed prior to execution of the
branch instruction. The resulting current pattern, prior
patterns, and pattern counts are input to a plurality of
prediction heuristics 303, and the branch instruction is
executed 304 to determine a branch decision. The output of
each prediction heuristic is compared to the branch decision
and the results of the comparison are recorded 305.

Execution is stopped 306, and the profiling instructions
are removed 307 for all branch instructions. The BSEL field
of each branch instruction is then set to identify the prediction
heuristic which most accurately predicted the branch
decision for that instruction 308. Finally, the program is
saved from RAM to permanent storage 309.

one skilled in the art that the present invention is not limited
to this embodiment and that different circuitry and logic may
be used. Referring to FIG. 5, the array of pattern counts for
a branch is input to the MLPN 501 and the LLPN 502
prediction heuristics, each of which outputs an index of the
prior pattern to use in the prediction. The MRPN and DPLP
algorithms always output zero as the index of the prior
pattern to use in the prediction, because these heuristics
always use the most recent prior pattern when making the
prediction. The most recent pattern has an index of zero
when the patterns are ordered according to which pattern
occurred most recently. The indices are input to a multiplexer
503 and a selection is made based upon the BSEL
field in the instruction. The pattern index and the array of
prior patterns are input to an element selector 504, which

One skilled in the art will appreciate that many variations outputs the prior pattern with the given index. This pattern 

of tht method described in FIG, 3 are possible, without is compared to the maximum pattcxn length allowed, using 

departing from the spirit of the invention. For example, the comparatoa: 507. Because the pattern is represented by a 

insertion of the profiling instructions in step 301 may be saturating counter, the pattan will equal a maximum value 

accon5)lished jmor to loading the computer jffogram into ^ (the saturation value of the counter) when the TnaTiTnum 

memory, f<x exan^jle by a compilCT which compiles the pattern length is met or exceeded. The output of comparator 

program from a high-level language into machine language. 507 is inverted, so that when the selected pattern does not 

Also, tiie ronoval of the profiling instructions and setting of meet or exceed the niayimum length allowed fcr a pattan, 

the BSEL field in the branch instruction to identify the the ou^ut is asserted. The asserted ouQ>ut comprises one 

prediction heuristic which most accurately predicts the 25 input to AND gate 509. 

branch (steps 307-308) may be performed subsequentiy to The prior pattern with the selected index is also input to 

saving the program from RAM to pennanent storage 309, adder 506. Using selector 505, the difference between prior 

for example using a compQer. In an alternative embodiment, pauem 0 and prior pattern 1 is also input to adder 506 if the 

&e steps described in FIG. 3 may be accoii^)Hshed during selected heuristic is DPLP. Otherwise a value of 0 is input 

normal execution of the program for a number of initial 30 to adder 506. Thus if the selected heuristic is DPLP, the 

iterations of the program code. This alternate embodiment difference b^ecn prior pattern 0 and prior pattern 1 is 

has the advantage that no sq)arate execution of the program added to the prior pattern 0 (the selected prior pattern for the 

is required to identify the prediction heuristics of the branch dPUP heuristic). For all other heuristics, 0 is added to the 

instructions encountffcd during execution. A disadvantage is pnor pattern. The output of adder 506 is the predicted next 

that execution efficiency of the program may be adversely 33 pattern. The predicted pattern is compared with tiie next 

affected durmg tiie initial iterations of the program code currentpattanwhidi would result from a Taken decision. If 

while profiling is earned out there is a match, then comparatw 508 ou^ut is asserted. 

In one embodiment, the plurality of local predictors is This asserted input is ii^ut to AND gate 509. If both tiie 

supplenoented by one or more global predictors. Global ou^ut of comparator 508 and the (inverted) ou^ut of 

branch history information is recorded and input to the 40 comparator 507 are asserted, then AND gate 509 output is 

global predictor. The global history information can indude asserted. If tiie address of the branch instruction matches the 

a global histOTy sequence and a table <rf counts, each count address in the look-up table entry used to coiiq>ute the 

indicating the nunaber of times the corresponding global branch decision, then multiplexer 510 selects Taken as the 

sequence has occurred. Additional profiling instructions may next branch prediction. Otherwise, Ijy default. Not Taken is 

be used to further conqjute ttie global histCHy information 45 predicted for forward branches, and Taken is predicted for 

and input this information to a global predictor. The output backward branches. 

of the global predictor is compared with the ou^uts of tiie piG. 6 shows one embodiment of a circuit to update tiie 

local predictors. If tiie global predictor more accurately local branch history. The current pattern 610 is incremented 

predicts whetiier the brandi instruction wiU be Taken, tfien using adder 600 and tiic resuU is input to maximizer circuit 

tiie BSEL field of tiie branch instruction is set to identify tiie 50 601. The second input of maximizcr 601 is the maximum 

global predictor, ottierwise BSEL is set to identify tiie local allowable pattern, in tius example binary 11111, The maxi- 

predictor which most accurately i^edicts tiie branch instruc- mizcr 601 outputs tiie maximum of tiie incremented current 

pattern and tiie maximum value allowed, binary 11111. 

FIG. 4 shows one embodiment in which tiie outputs of tiie Using multiplexer 614 tiie updated current pattern is ii^ut to 

local predictors 401 are input to a multiplexer 405 and, in 55 multiplexer 606 if tiie branch was Not Taken, or reset to 0 

addition, tiie ou^ut of a global predictor 402 is input to a ottierwise. Multiplexer 606 and multiplexer 603 are used to 

multiplexer 405. The BSEL field of tiie instruction 400 reset tiie current pattem when tiiere is an indication 612 tiiat 

selects tiie output of citiier a local predictor 401 or tiie global the branch address does not match tiie address of tiie branch 

predictor 402. depending on which predictor is identified by in tiie history table. When Uie addresses do not match, a 

BSEL. In tills embodiment, local histcay information 403 is 60 hash-collision has occurred and aU history information for 

input to tiie local predictors 401. The local history informa- tiie branch must be reset The branch decision 614 selects 

tion can include ttie output of local history tables. Global eitiia a logical one (1) (if tiie branch was Not T^en) or a 

history information 404 in input to tiie global predictor 402. logical zero (0) (if tiie branch was Taken) as tiie reset value 

The global information can include tiie output of global of tiie current pattem. Multiplexer 604 and 605 select reset 

history tables. 65 ygUxc^ for tiie arrays of counts and prior patterns. When a 

FIG. 5 shows one embodiment of a circuit that imple- hash collision occurs, multqjlexers 607 and 608 select tiie 

mcnts multiple local prediction heuristics. It is apparent to reset values for tiie counts and prior patterns (input 0 of tiie 
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multiplexers). Otherwise, multiplexers 607 and 608 select 
the updated counts and patterns from update logic 602 (input 
1 of the multiplexers). 
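
The current-pattern update of FIG. 6 can be sketched in software as follows. This is a minimal illustration, not the circuit itself: the function name update_current_pattern and its parameters are hypothetical, and the pattern is modeled as a plain integer. The text says maximizer 601 outputs the maximum of its two inputs; the sketch assumes the intended behavior is saturation at binary 11111, so it clamps the incremented value to that limit.

```python
MAX_PATTERN = 0b11111  # maximum allowable pattern from the example in the text


def update_current_pattern(current, taken, address_match=True):
    """Return the next current pattern (hypothetical software model of FIG. 6).

    current       -- current pattern, an integer in 0..MAX_PATTERN
    taken         -- True if the branch was Taken
    address_match -- False when a hash collision was detected
    """
    if not address_match:
        # Hash collision: history is reset. The reset value is 1 if the
        # branch was Not Taken and 0 if it was Taken.
        return 0 if taken else 1
    if taken:
        # A Taken decision resets the pattern to 0.
        return 0
    # Not Taken: increment, saturating at MAX_PATTERN (assumed intent of
    # the "maximizer" stage).
    return min(current + 1, MAX_PATTERN)
```

Under these assumptions, a pattern of 3 followed by a Not Taken decision becomes 4, a pattern already at binary 11111 stays there, and any Taken decision returns the pattern to 0.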

FIG. 7 shows one embodiment of the update logic for
updating the prior patterns and counts. If the branch is taken
700, then the counts of all prior patterns which match the
current pattern are incremented 702. The array of prior
patterns and counts is left-shifted 704, so that the least-
recent pattern and count are shifted out of the array. The
most recent prior pattern is then set to the current pattern
706. The count of the most recent pattern is set to the number
of occurrences of the most recent pattern in the array of prior
patterns 708.
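
The update steps 700 through 708 can be illustrated with a short sketch. It assumes that the prior patterns and counts are kept in two equal-length, non-empty lists with the most recent entry at index 0 (the ordering described for FIG. 5), and that these arrays are left unchanged when the branch is Not Taken; the text does not state the latter, so it is an assumption. The shift is modeled by dropping the last (least recent) entry and inserting the new one at the front. The name update_history is hypothetical.

```python
def update_history(prior_patterns, counts, current_pattern, taken):
    """Return updated (prior_patterns, counts) lists (hypothetical model of FIG. 7).

    Index 0 holds the most recent prior pattern; the last index holds the
    least recent one.
    """
    if not taken:
        # Assumption: the arrays only change when the branch is Taken (700).
        return list(prior_patterns), list(counts)

    # 702: increment the count of every prior pattern equal to the current one.
    bumped = [c + 1 if p == current_pattern else c
              for p, c in zip(prior_patterns, counts)]

    # 704: shift the arrays so the least recent pattern and count drop out.
    # 706: the most recent prior pattern becomes the current pattern.
    new_patterns = [current_pattern] + list(prior_patterns[:-1])
    new_counts = [0] + bumped[:-1]

    # 708: the count of the most recent pattern is its number of
    # occurrences in the updated array of prior patterns.
    new_counts[0] = new_patterns.count(current_pattern)
    return new_patterns, new_counts
```

For example, with prior patterns [3, 5, 1], counts [1, 1, 1], and a current pattern of 5, a Taken decision yields prior patterns [5, 3, 5] and counts [2, 1, 2].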

It is readily apparent from FIGS. 5 and 6 that using the
present invention, a plurality of prediction heuristics may
be implemented with low hardware costs. Hardware costs
are reduced using the present invention because many of the
hardware components for computing the prediction and
updating the history information are shared among the
plurality of prediction heuristics.
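
To make the sharing concrete, the prediction path of FIG. 5 can be modeled as one routine in which only the index selection and the second adder input depend on the selected heuristic. The sketch below is an illustration under stated assumptions: the Heuristic enumeration and its BSEL encodings, the function name predict_taken, the tie-breaking rule (the most recent matching entry wins), and the pattern_if_taken argument (the current pattern that a Taken decision would produce, supplied by the caller because the pattern encoding is not fixed here) are all hypothetical. It also reads multiplexer 510 as passing the output of AND gate 509 when the addresses match.

```python
from enum import Enum

MAX_PATTERN = 0b11111  # saturation value of the pattern counter (example value)


class Heuristic(Enum):
    # Hypothetical BSEL encodings; the text does not specify them.
    MLPN = 0
    LLPN = 1
    MRPN = 2
    DPLP = 3


def predict_taken(bsel, prior_patterns, counts, pattern_if_taken,
                  address_match=True, backward=False):
    """Return True for a Taken prediction (hypothetical model of FIG. 5)."""
    if not address_match:
        # Default: Taken for backward branches, Not Taken for forward ones.
        return backward

    # 501/502/503: pick the index of the prior pattern to use.
    if bsel is Heuristic.MLPN:
        index = counts.index(max(counts))   # most frequent prior pattern
    elif bsel is Heuristic.LLPN:
        index = counts.index(min(counts))   # least frequent prior pattern
    else:
        index = 0                           # MRPN and DPLP use the most recent

    selected = prior_patterns[index]        # 504: element selector

    # 505/506: DPLP adds the difference between prior pattern 0 and prior
    # pattern 1; every other heuristic adds 0.
    delta = prior_patterns[0] - prior_patterns[1] if bsel is Heuristic.DPLP else 0
    predicted = selected + delta

    not_saturated = selected < MAX_PATTERN  # 507, inverted
    matches = predicted == pattern_if_taken  # 508
    return not_saturated and matches         # 509
```

Everything after the index selection is common to all four heuristics, which mirrors the point that the adder, comparators, and AND gate are shared rather than duplicated per heuristic.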

The specific arrangements and methods described herein
are merely illustrative of the principles of the invention.
Numerous modifications in form and detail may be made by
those of ordinary skill in the art without departing from the
scope of the present invention. For example, the invention
described herein is not limited to any particular coding
method for the history pattern information, nor is it limited
to the specific heuristics described herein.

Although this invention has been shown in relation to a 
particular embodiment, it should not be considered so lim- 
ited. Rather, the invention is limited only by the scope of the 
appended claims. 

We claim: 

1. A method of predicting whether a branch specified by 
an instruction in a computer program will be taken during
execution of the program by a computer, the method com-
prising the steps of:

providing the instruction from the program, the instruc-
tion comprising the branch specified, the instruction
configured to cause selection of a prediction heuristic
from a plurality of distinct prediction heuristics;

storing in a memory a current pattern comprising a
number of consecutive identical branch decisions for
the instruction;

storing in the memory a prior pattern comprising a
number of consecutive identical prior branch decisions
for the instruction, the prior branch decisions occurring
prior to the branch decisions comprised by the current
pattern;

generating a prediction of whether the branch will be 
taken using the selected prediction heuristic, the 
selected prediction heuristic using the current pattern 
and the prior pattern to generate the prediction. 

2. The method of claim 1 further comprising the steps of:
executing the instruction to determine a branch decision;
and
updating the current pattern and the prior pattern accord-
ing to the branch decision.

3. The method of claim 1 further comprising the steps of
storing in the memory a pattern count, the pattern count
comprising a count of occurrences of the prior pattern, the
selected prediction heuristic using the pattern count to
generate the prediction.

4. The method of claim 3 further comprising the steps of:
executing the instruction to determine a branch decision;
and
updating the pattern count according to the branch deci-
sion.

5. The method of claim 1 further comprising the steps of:
storing in the memory a global sequence, the global
sequence comprising a number of branch decisions for
a plurality of instructions;
storing in the memory a global count, the global count 
comprising a count of occurrences of the global 
sequence; and 

said step of generating including generating a prediction 
of whether the branch will be taken using a global 
prediction heuristic, the global heuristic using the glo- 
bal count to generate the prediction. 

6. The method of claim 5 further comprising the steps of:
executing the instruction to determine a branch decision;
and
updating the current pattern, the prior pattern, the global
sequence, and the global count according to the branch
decision.

7. The method of claim 1 further comprising the steps of 
inputting the current pattern and the prior pattern to each 
prediction heuristic of the plurality of prediction heuristics, 
the output of the selected prediction heuristic being the 
prediction of whether the branch will be taken. 

8. The method of claim 1 further comprising the steps of
inputting the current pattern and the prior pattern to only the
selected prediction heuristic, the output of the selected
prediction heuristic being the prediction of whether the 
branch will be taken. 

9. The method of claim 1 in which the plurality of
prediction heuristics comprises a most-likely-pattern-next
(MLPN) prediction heuristic.

10. The method of claim 1 in which the plurality of
prediction heuristics comprises a least-likely-pattern-next
(LLPN) prediction heuristic.

11. The method of claim 1 in which the plurality of
prediction heuristics comprises a most-recent-pattern-next
(MRPN) prediction heuristic.

12. The method of claim 1 in which the plurality of
prediction heuristics comprises a delta-plus-last-pattern
(DPLP) prediction heuristic.

13. The method of claim 1 in which the prediction
heuristic selected by the branch instruction is determined by
a method comprising the steps of:

locating the branch instruction in a program and adding
profiling instructions for the branch instruction to the
program, the profiling instructions, when executed,
computing history information for the branch instruc-
tion and inputting the history information to a plurality
of prediction heuristics, each prediction heuristic out-
putting a prediction of whether the branch instruction
will be taken;

executing the program with a sample data set, the execu-
tion of the program causing execution of the profiling
instructions, the execution of the program further caus-
ing execution of the branch instruction, the execution
of the branch instruction comprising a branch decision;
and

comparing the output of each prediction heuristic from the
plurality of prediction heuristics to the branch decision
to identify a prediction heuristic from the plurality of
prediction heuristics which most accurately predicts the
branch decision for the branch instruction.

14. A method of determining a prediction heuristic for a
branch instruction in a computer program, the method
comprising the steps of:

locating the branch instruction in the program and adding
profiling instructions for the branch instruction to the
program, the profiling instructions, when executed,
computing history information for the branch instruc-
tion and inputting the history information to a plurality
of prediction heuristics, each prediction heuristic out-
putting a prediction of whether the branch instruction
will be taken;

executing the program with a sample data set, the execu-
tion of the program causing execution of the profiling
instructions, the execution of the program further caus-
ing execution of the branch instruction, the execution
of the branch instruction comprising a branch decision;
and

comparing the output of each prediction heuristic from the
plurality of prediction heuristics to the branch decision
to identify a prediction heuristic from the plurality of
prediction heuristics which most accurately predicts the
branch decision for the branch instruction.

15. The method of claim 14 further comprising the steps
of:

removing the profiling instructions from the program; and
adding to the instruction an identification of the prediction
heuristic which most accurately predicts the branch
decision for the branch instruction.

16. The method of claim 14 in which the plurality of
prediction heuristics comprises a most-likely-pattern-next
(MLPN) prediction heuristic.

17. The method of claim 14 in which the plurality of
prediction heuristics comprises a least-likely-pattern-next
(LLPN) prediction heuristic.

18. The method of claim 14 in which the plurality of
prediction heuristics comprises a most-recent-pattern-next
(MRPN) prediction heuristic.

19. The method of claim 14 in which the plurality of
prediction heuristics comprises a delta-plus-last-pattern
(DPLP) prediction heuristic.

20. An apparatus for predicting whether a branch speci-
fied by an instruction in a computer program will be taken
during execution of the program by a computer, the instruc-
tion configured to cause selection of a prediction heuristic
from a plurality of distinct heuristics to predict whether the
branch specified by the instruction will be taken, the appa-
ratus comprising:

a first storage device that stores a current pattern com-
prising a number of consecutive identical branch deci-
sions for the instruction, said storage device also stor-
ing a prior pattern comprising a number of consecutive
identical prior branch decisions for the instruction, the
prior branch decisions occurring prior to the branch
decisions comprised by the current pattern;

a circuit that includes a plurality of prediction heuristics
comprising the selected prediction heuristic;

and a circuit configured to use the selected prediction
heuristic using the current pattern and the prior pattern
to generate a prediction of whether the branch will be
taken.

21. The apparatus of claim 20 further comprising a pattern
count, the pattern count comprising a count of occurrences
of the prior pattern, the selected prediction heuristic using
the pattern count to generate a prediction of whether the
branch will be taken.

22. The apparatus of claim 21 in which the computer
comprises an execution unit, the execution unit to execute
the instruction and to output a branch decision indicating
whether the branch specified by the instruction was taken,
the apparatus further comprising an update unit to update the
pattern count according to the branch decision.

23. The apparatus of claim 20 in which the computer
comprises an execution unit, the execution unit to execute
the instruction and to output a branch decision indicating
whether the branch specified by the instruction was taken,
the apparatus further comprising an update unit to update the
current pattern and the prior pattern according to the branch
decision.

24. The apparatus of claim 20 further comprising:

a second storage device that stores a global sequence, the
global sequence comprising a number of branch deci-
sions for a plurality of instructions, said second storage
device also storing a global count, the global count
comprising a count of occurrences of the global
sequence; and

a circuit including a global prediction heuristic using the
global sequence and the global count wherein said
circuit configured to use the plurality of prediction
heuristics to generate a prediction of whether the
branch will be taken.

25. The apparatus of claim 24 in which the computer
comprises an execution unit, the execution unit to execute
the instruction and to output a branch decision indicating
whether the branch specified by the instruction was taken,
the apparatus further comprising an update unit to update the
global sequence, the global count, the current pattern, and
the prior pattern according to the branch decision.

26. The apparatus of claim 20 in which the current pattern
and the prior pattern are input to each prediction heuristic of
the plurality of prediction heuristics, the output of the
selected prediction heuristic being the prediction of whether
the branch will be taken.

27. The apparatus of claim 20 in which the current pattern
and the prior pattern are input to only the selected prediction
heuristic, the output of the selected prediction heuristic
being the prediction of whether the branch will be taken.

28. The apparatus of claim 20 in which the plurality of
prediction heuristics comprise logic to implement a most-
likely-pattern-next (MLPN) prediction heuristic.

29. The apparatus of claim 20 in which the plurality of
prediction heuristics comprise logic to implement a least-
likely-pattern-next (LLPN) prediction heuristic.

30. The apparatus of claim 20 in which the plurality of
prediction heuristics comprise logic to implement a most-
recent-pattern-next (MRPN) prediction heuristic.

31. The apparatus of claim 20 in which the plurality of
prediction heuristics comprise logic to implement a delta-
plus-last-pattern (DPLP) prediction heuristic.

32. The apparatus of claim 20 in which the prediction
heuristic selected by the branch instruction is determined by
an apparatus comprising:

a profiler to locate the branch instruction in a program, the
profiler to add profiling instructions for the branch
instruction to the program, the profiling instructions,
when executed, to compute history information for the
branch instruction and to input the history information
to a plurality of prediction heuristics, each prediction
heuristic to output a prediction of whether the branch
instruction will be taken;

an execution unit to execute the program with a sample
data set, the execution of the program causing execu-
tion of the profiling instructions, the execution of the
program further causing execution of the branch
instruction, the execution of the branch instruction
comprising a branch decision; and

a comparator to compare the output of each prediction
heuristic from the plurality of prediction heuristics to
the branch decision, the comparator to identify a pre-
diction heuristic from the plurality of prediction heu-
ristics which most accurately predicts the branch deci-
sion for the branch instruction.

33. An apparatus to determine a prediction heuristic for a
branch instruction in a computer program, the apparatus
comprising:

a profiler to locate the branch instruction in the program,
the profiler adding profiling instructions for the branch
instruction to the program, the profiling instructions,
when executed, to compute history information for the
branch instruction and to input the history information
to a plurality of prediction heuristics, each prediction
heuristic to output a prediction of whether the branch
instruction will be taken;

an execution unit to execute the program with a sample
data set, the execution of the program causing execu-
tion of the profiling instructions, the execution of the
program further causing execution of the branch
instruction, the execution of the branch instruction
comprising a branch decision; and

a comparator to compare the output of each prediction
heuristic from the plurality of prediction heuristics to
the branch decision, the comparator to identify a pre-
diction heuristic from the plurality of prediction heu-
ristics which most accurately predicts the branch deci-
sion for the branch instruction.

34. The apparatus of claim 33 further comprising an
optimizer to remove the profiling instructions from the
program, the optimizer further to add to the branch instruc-
tion an identification of the prediction heuristic which most
accurately predicts the branch decision for the branch
instruction.

35. A system for predicting whether a branch specified by
an instruction in a computer program will be taken during
execution of the program by a computer, the instruction
configured to cause selection of a prediction heuristic from
a plurality of distinct heuristics to predict whether the branch
specified by the instruction will be taken, the system com-
prising:

a first storage device that stores a current pattern com-
prising a number of consecutive identical branch deci-
sions for the instruction, said first storage device also
storing a prior pattern comprising a number of con-
secutive identical prior branch decisions for the
instruction, the prior branch decisions occurring prior
to the branch decisions comprised by the current pat-
tern;

a circuit including a plurality of prediction heuristics
comprising the selected prediction heuristic;

a circuit configured to use the plurality of prediction
heuristics, that use the current pattern and the prior
pattern to generate a prediction of whether the branch
will be taken; and

an execution unit to execute the instruction, the execution
unit to output a branch decision indicating whether the
branch specified by the instruction was taken.

36. The system of claim 35 further comprising a pattern
count, the pattern count comprising a count of occurrences
of the prior pattern, the selected prediction heuristic using
the pattern count to generate a prediction of whether the
branch will be taken.

37. The system of claim 35 further comprising an update
unit to update the pattern count according to the branch
decision.

38. The system of claim 35 further comprising:

a second storage device that stores a global sequence, the
global sequence comprising a number of branch deci-
sions for a plurality of instructions, said second storage
device also storing a global count, the global count comprising
a count of occurrences of the global sequence; and

a circuit including a global prediction heuristic using the
global sequence and the global count to generate a
prediction of whether the branch will be taken.

39. The system of claim 38 further comprising an update
unit to update the global sequence, the global count, the
current pattern, and the prior pattern according to the branch
decision.

40. An apparatus for predicting whether a branch speci-
fied by an instruction in a computer program will be taken
during execution of the program by a computer, the instruc-
tion configured to cause selection from a plurality of distinct
prediction means of a prediction means for predicting
whether the branch specified by the instruction will be taken,
the apparatus comprising:

means for storing a current pattern comprising a number
of consecutive identical branch decisions for the
instruction, the means for storing further storing a prior
pattern comprising a number of consecutive identical
prior branch decisions for the instruction, the prior
branch decisions occurring prior to the branch deci-
sions comprised by the current pattern; and

a circuit including a plurality of prediction means com-
prising the selected prediction means, the circuit con-
figured to use the plurality of prediction means that use
the current pattern and the prior pattern for generating
a prediction of whether the branch will be taken.

41. The apparatus of claim 40, the means for storing
further storing a pattern count, the pattern count comprising
a count of occurrences of the prior pattern, the selected
prediction means using the pattern count to generate a
prediction of whether the branch will be taken.

42. The apparatus of claim 41 in which the computer
comprises an execution means, the execution means for
executing the instruction and outputting a branch decision
indicating whether the branch specified by the instruction
was taken, the apparatus further comprising update means
for updating the pattern count according to the branch
decision.

43. The apparatus of claim 40 in which the computer
comprises an execution means, the execution means for
executing the instruction and outputting a branch decision
indicating whether the branch specified by the instruction
was taken, the apparatus further comprising update means
for updating the current pattern and the prior pattern accord-
ing to the branch decision.

44. The apparatus of claim 40 wherein the means for
storing further storing a global sequence, the global
sequence comprising a number of branch decisions for a
plurality of instructions, the means for storing further storing
a global count, the global count comprising a count of
occurrences of the global sequence, the apparatus further
comprising circuit means including global prediction means
using the global sequence and the global count for gener-
ating a prediction of whether the branch will be taken.

45. The apparatus of claim 44 in which the computer
comprises an execution means, the execution means for
executing the instruction and for outputting a branch deci-
sion indicating whether the branch specified by the instruc-
tion was taken, the apparatus further comprising update
means for updating the global sequence, the global count,
the current pattern, and the prior pattern according to the
branch decision.

46. The apparatus of claim 40 in which the current pattern
and the prior pattern are input to each prediction means of
the plurality of prediction means, the output of the selected
prediction means being the prediction of whether the branch
will be taken.

47. The apparatus of claim 40 in which the current pattern 
and the prior pattern are input to only the selected prediction 
means, the output of the selected prediction means being the 
prediction of whether the branch will be taken. 

48. An apparatus to determine the prediction heuristic
identified by a branch instruction in a computer program, the
apparatus comprising:

means for locating the branch instruction and adding
profiling instructions for the branch instruction to the
program, the profiling instructions, when executed, for
computing history information for the branch instruc-
tion and for inputting the history information to a
plurality of prediction means, each prediction means
for outputting a prediction of whether the branch
instruction will be taken;

means for executing the program with a sample data set,
the execution of the program causing execution of the
profiling instructions, the execution of the program
further causing execution of the branch instruction, the
execution of the branch instruction comprising a branch
decision; and

means for comparing the output of each prediction means
from the plurality of prediction means to the branch
decision, the comparator means identifying a prediction
means from the plurality of prediction means which
most accurately predicts the branch decision for the
branch instruction.

49. The apparatus of claim 48 further comprising means
for removing the profiling instructions from the program, the
means for removing profiling instructions further adding to
the branch instruction an identification of the prediction
means which most accurately predicts the branch decision
for the branch instruction.

50. A system for predicting whether a branch specified by
an instruction in a computer program will be taken during
execution of the program by a computer, the instruction
configured to cause selection from a distinct plurality of
prediction means of a prediction means for predicting
whether the branch specified by the instruction will be taken,
the system comprising:

means for storing a current pattern comprising a number
of consecutive identical branch decisions for the
instruction, the means for storing further storing a prior
pattern comprising a number of consecutive identical
prior branch decisions for the instruction, the prior
branch decisions occurring prior to the branch deci-
sions comprised by the current pattern;

a circuit including a plurality of prediction means com-
prising the selected prediction means, the circuit con-
figured to use the plurality of prediction means that use
the current pattern and the prior pattern for generating
a prediction of whether the branch will be taken; and

means for executing the instruction, the execution means
for outputting a branch decision indicating whether the
branch specified by the instruction was taken.

51. The system of claim 50 in which the means for storing
further stores a pattern count, the pattern count comprising
a count of occurrences of the prior pattern, the selected
prediction means using the pattern count to generate a
prediction of whether the branch will be taken.

52. The system of claim 50 further comprising update
means for updating the pattern count according to the branch
decision.

53. The system of claim 50 wherein the means for storing
further storing a global sequence, the global sequence com-
prising a number of branch decisions for a plurality of
instructions, the means for storing further storing a global
count, the global count comprising a count of occurrences of
the global sequence, the apparatus further comprising circuit
means including global prediction means using the global
sequence and the global count for generating a prediction of
whether the branch will be taken.

54. The system of claim 53 further comprising update
means for updating the global sequence, the global count,
the current pattern, and the prior pattern according to the
branch decision.